SCIM -- A METER FOR MEASURING THE PERFORMANCE
OF SPEECH COMMUNICATION SYSTEMS

ABSTRACT 

Two major theories have emerged that attempt to relate in a
quantitative way the physical characteristics of a speech
communication system and the intelligibility of speech as perceived
by a crew of trained listeners.

The more widely used theory is the basis for the "Articulation
Index" (AI) formulated in 1947 by N. R. French and J. C. Steinberg
of the Bell Telephone Laboratories. This theory holds that the
speech-to-noise ratio of a communication system, measured in 20
narrow frequency bands and properly weighted and summed, yields
an AI value that is directly related to the intelligibility of
speech heard over that system.

A  second  theoretical  approach  to  this  problem  is  embodied  in  the 
Pattern  Correspondence  Index  (PCI)  machine  proposed  by  J.C.R. 
Licklider  in  1956.  This  instrument  determines  the  correlation 
in  the  frequency  and  time  domains  between  the  speech  input  to  a 
communication  system  and  the  speech  output  of  that  system. 

The subject of the present paper is an electronic device called
SCIM (Speech Communication Index Meter) that is based primarily
on the Articulation Index theory; however, SCIM accommodates
additional factors affecting speech that are not involved in the
original procedures for calculating AI. The SCIM signal generator
produces a 3-second signal burst, which is fed to the electrical
input of the system under test; within 12 seconds the SCIM
analyzer, located at the receiving station, calculates and
displays an index, ranging from .00 to .99, which reflects the
ability of that system to transmit intelligible speech.
Although devices based on similar principles have been developed
previously, SCIM measures several parameters affecting the
performance of a speech system differently from previous
instruments. SCIM was designed to be relatively small, to have
low power requirements, to be capable of remote digital readout,
and to include other features that make it suitable for field,
fixed-station, or laboratory use.

The results of SCIM measurements and actual speech intelligibility
tests obtained from a variety of communication systems and
operating conditions will be presented.


SECTION 1


INTRODUCTION 

The valid assessment of the performance of speech communication
systems has always been a difficult problem. The requirements
for assessment have usually been two: (1) to aid and guide the
design of electronic components (microphones, amplifiers,
transmitters, receivers, earphones, etc.) and (2) to monitor
objectively the performance of communication systems during
actual operations or under simulated operational conditions.

The task, of course, has been made difficult, first, because
speech is physically so complex and dynamic and, second, because
the information content of the speech signal, as perceived by the
human listener, is not always simply related to the physical
nature of the signal, at least insofar as our present methods of
physical analysis would indicate.

Speech intelligibility or speech articulation tests provide,
when properly used, a reasonably accurate and valid procedure
for measuring differences in the performance of communications
equipment and systems. However, these psychological tests,
which usually require laboratory conditions for their
administration and specially trained crews of talkers and
listeners, are costly and time-consuming and, in most cases,
cannot be readily applied to the measurement of speech systems
under "real life" operational conditions.

Nevertheless, from some 30 years of laboratory research with
speech intelligibility and articulation tests, there has
emerged a rather extensive set of facts about the relation
between the measurable physical characteristics of the received
speech signal (including the noise, if any, present with the
speech) and the understandability of speech as perceived by
listeners. Various investigators have attempted to use this
knowledge to develop methods of system evaluation that avoid
the speech testing process per se, yet provide a measure that
is interpretable in terms of speech understandability.

One outcome of this latter effort has been the development
of methods for calculating, from physical measurements of a
communication system, what the intelligibility of speech over
that system should be; a secondary outcome has been the design
of electronic test devices which, when applied to a speech
communication system, automatically evaluate its performance.

The purposes of the present paper are (1) to outline briefly
the theory and scientific basis for the design of one such
test instrument, called the Speech Communication Index Meter
("SCIM"), for measuring the performance of speech communication
systems; (2) to describe the SCIM device; and (3) to compare
the measurements made with SCIM on a variety of speech
communication systems with the results of speech intelligibility
tests obtained for the same systems.


SECTION  2 


GENERAL  RELATIONS  BETWEEN  INTELLIGIBILITY  AND  PHYSICAL 
CHARACTERISTICS  OF  SPEECH 

Two  major  theories  have  emerged  that  attempt  to  relate  in  a 
quantitative  way  the  physical  characteristics  of  a  speech 
communication  system  and  the  intelligibility  of  speech  as 
perceived  by  a  crew  of  trained  listeners. 

The more widely used theory is the basis for the "Articulation
Index" (AI) formulated in 1947 by N. R. French and J. C.
Steinberg of the Bell Telephone Laboratories. This theory
holds that the speech-to-noise ratio, in 20 narrow frequency
bands, of a communication system will provide an AI value
that is directly related to the intelligibility of speech
heard over that system.

A second theoretical approach to this problem is embodied
in the Pattern Correspondence Index (PCI) machine proposed
by J. C. R. Licklider in 1956. This instrument determines
the correlation in the frequency and time domains between
the speech input to a communication system and the speech
output of that system.

SCIM is designed primarily in terms of the Articulation Index
theory; however, SCIM accommodates additional factors affecting
speech that are not involved in the original procedures
for calculating AI.
In order to explain the Articulation Index, it is necessary
to describe some of the general characteristics of the speech
signal and of masking by noise.

Figure 1 shows the variations in the pressure of conversational
speech, measured over short 1/8-second intervals of time
(1/8 second was chosen because it represents the average
duration of individual speech sounds), as a function of
frequency. The graph is plotted in terms of the pressure per
cycle; we see that speech contains energy over the frequency
range from at least 100 to 7000 cps and that the range of
pressures exceeds 30 dB over this entire frequency range.
This is the range found among individual speech sounds when
the speech is uttered at a supposedly normal, constant level
of effort. It does not represent the range between weak
talking and shouting, or among different talkers. This figure
demonstrates the complex and dynamic nature of the speech signal.

Masking. A factor that enters into the calculation of AI
is the masking effect of noise: noise containing sufficient
energy at a given point on the frequency scale can mask, or
prevent the reception of, speech components not only in the
same frequency region but also above and below the frequency
of the noise. Upward spread-of-masking, in which a low-frequency
tone or noise masks a higher-frequency portion of the speech
spectrum, is far more severe than downward spread-of-masking,
in which the masker is higher in frequency than the signal of
interest. Figure 2 illustrates the effect of this
"spread-of-masking" phenomenon upon the perception of pure
tones.
FIG. 1 SHOWING FOUR CURVES RELATING DIFFERENT
MEASURES OF SPEECH LEVEL TO FREQUENCY
(AFTER DUNN AND WHITE)


FIG. 2 THRESHOLD SHIFT DUE TO A 350-450 CPS MASKING
NOISE AT 85 DB OVERALL SOUND PRESSURE LEVEL
(AFTER CARTER AND KRYTER)
AXES: THRESHOLD SHIFT IN DB VS. FREQUENCY IN CYCLES PER SECOND


Figure 2 shows that a narrow band of noise centered at 400 cps
would mask information around 800 cps as effectively as a
band of noise centered at 800 cps but 18 dB lower in level
than the band centered at 400 cps; it would also mask
information at 1600 cps to much the same degree as would a
band of noise located at 1600 cps but 30 dB lower in level.
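The idealized spread-of-masking rule adopted for SCIM (stated in a footnote in Section 3: masking falls off at 18 dB per octave above the masker and is neglected below it) can be written as a short function. This is a minimal sketch of that idealization only; the function name `effective_masking_db` is hypothetical, and the real curve of Fig. 2 also depends on masker level.

```python
import math

# Idealized upward spread-of-masking slope assumed for SCIM (dB per octave).
UPWARD_SLOPE_DB_PER_OCTAVE = 18.0


def effective_masking_db(masker_level_db, masker_freq_cps, target_freq_cps):
    """Masking level (dB) that a narrow-band masker produces at target_freq_cps.

    Upward spread: the level falls by 18 dB for each octave above the masker.
    Downward spread is treated as negligible, so None is returned below it.
    """
    if target_freq_cps < masker_freq_cps:
        return None  # idealization: no downward spread of masking
    octaves_above = math.log2(target_freq_cps / masker_freq_cps)
    return masker_level_db - UPWARD_SLOPE_DB_PER_OCTAVE * octaves_above


# A 400 cps masking band at 85 dB, evaluated one and two octaves higher.
print(effective_masking_db(85.0, 400.0, 800.0))
print(effective_masking_db(85.0, 400.0, 1600.0))
```

At 800 cps the rule gives a level 18 dB below the masker, matching the Fig. 2 reading quoted above; at 1600 cps it gives 36 dB below rather than the roughly 30 dB read from Fig. 2, which is the kind of simplification the idealized curve accepts.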

Calculation of the Articulation Index. As previously mentioned,
the AI of a speech communication link is normally calculated
by measuring the speech-to-noise (S/N) ratio in a number of
frequency bands; the AI is defined as the weighted sum of
these S/N ratios. Two fundamental restrictions are placed
upon the calculated AI, namely, that it can never be less
than 0.00 and that it may never exceed 1.00. To implement
the former restriction, we assume that all measured S/N ratios
less than -12 dB are equal to -12 dB; and, since the
band-weighting functions are derived so that the AI equals 1.00
when the S/N ratio in all bands is +18 dB, we assume that all
S/N ratios in excess of +18 dB are equal to +18 dB. Figure 3
illustrates the general concept and the work sheet used for
the calculation of AI. In Fig. 3 the speech spectrum is
divided into 20 narrow bands, chosen so that each band
contributes equally to speech intelligibility. It is also
seen that speech at very weak and very intense levels does
not contribute to intelligibility because it lies outside
the limits of the normal ear.

FIG. 3 WORK SHEET FOR THE CALCULATION OF THE ARTICULATION INDEX (AI).
SPEECH SPECTRUM PLOTTED IS THE LONG-TERM RMS OF SPEECH PLUS 12 DB;
THE DASHED CURVE IS THE SPREAD-OF-MASKING TO BE EXPECTED FROM ...

On Fig. 3 we have sketched the speech spectrum that might be
present at the listener's ear from a typical speech system,
along with an octave band of masking noise that is also
presumed to be mixed with the speech at the listener's ear.
Note that, because of the upward spread-of-masking described
above, the masking spectrum of the noise exceeds the actual
noise spectrum at the higher frequencies. The AI is calculated
by taking the S/N ratio between the plotted speech spectrum
(long-term rms plus 12 dB) and the noise or masking spectrum,
whichever is greater, for each of the 20 bands; weighting each
S/N ratio proportionately so that +30 dB equals 1.00 and 0 dB
equals 0; summing the results; and dividing by 20.
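The work-sheet procedure just described can be sketched in a few lines. This is an illustrative sketch, not the French-Steinberg work sheet itself: it assumes the caller has already read, for each of the 20 equal-contribution bands, the level of the plotted speech spectrum (long-term rms plus 12 dB) and the greater of the noise and spread-of-masking levels. The function name is hypothetical.

```python
def articulation_index(speech_peak_db, masking_db):
    """AI from per-band levels, following the 20-band work-sheet procedure.

    speech_peak_db: speech spectrum level per band (long-term rms + 12 dB).
    masking_db: per band, the greater of the noise and spread-of-masking levels.
    """
    if len(speech_peak_db) != 20 or len(masking_db) != 20:
        raise ValueError("the work-sheet method uses exactly 20 bands")
    total = 0.0
    for s, n in zip(speech_peak_db, masking_db):
        # Limit to 0..30 dB (i.e. -12..+18 dB re long-term rms),
        # then weight so that 0 dB -> 0 and +30 dB -> 1.00.
        diff = min(max(s - n, 0.0), 30.0)
        total += diff / 30.0
    return total / 20.0  # each band contributes equally


print(articulation_index([45.0] * 20, [30.0] * 20))
```

Because the plotted speech spectrum already includes the 12 dB offset, the 0 to 30 dB window used here is the same as the -12 to +18 dB limits stated for the rms S/N ratios.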

Frequency shift. The AI must be corrected for factors other
than noise that affect intelligibility. One such factor is
frequency shift. If a single-sideband radio receiver is not
tuned precisely to the carrier frequency of the transmitter,
the detected speech signal will appear shifted upward or
downward by an amount equal to the mismatch between the
transmitted carrier frequency and the receiver local
oscillator frequency. While upward shifts in the speech
spectrum of 100 cps or more can be tolerated, a downward shift
of 50 cps can have a devastating effect upon the intelligibility
of the speech.

The correction factor to be applied for frequency shift is
shown later, in Fig. 13.

Peak clipping. Peak clipping is another factor that enters
into the calculation of AI. Figure 4 shows the increase (over
unclipped speech of the same peak amplitude) in the long-term
rms pressure of peak-clipped speech as a function of the amount
of clipping and of the subsequent reamplification required to
restore the unclipped peak levels. Peak clipping must be taken
into account when calculating AI because the intelligibility
of speech is, to a first approximation, proportional to the
long-term rms signal-to-noise ratio and not to the peak S/N
ratio.

FIG. 4 SHOWING THE INCREASE IN RMS SPEECH
POWER AS A FUNCTION OF CLIPPING
WHEN CLIPPED LEVEL IS RAISED TO
CLIPPING REFERENCE LEVEL (AFTER
WATHEN-DUNN AND LIPKE)
AXES: DB INCREASE IN LONG-TERM RMS PRESSURE OF PEAK-CLIPPED SPEECH
RELATIVE TO LONG-TERM RMS PRESSURE OF UNCLIPPED SPEECH OF EQUAL PEAK
AMPLITUDE VS. AMOUNT OF PEAK-CLIPPING AND POST-CLIPPING AMPLIFICATION
IN DB (PEAK AMPLITUDE DEFINED BY 0.001 PROBABILITY LEVEL IN
DISTRIBUTION OF INSTANTANEOUS AMPLITUDES)
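The clipping effect described above can be reproduced qualitatively with a synthetic signal. This sketch is an assumption-laden illustration, not the Wathen-Dunn and Lipke measurement: it uses Laplace-distributed samples as a stand-in for the amplitude distribution of speech, takes the peak as the level exceeded with probability 0.001 (as in Fig. 4), clips by a given number of dB, reamplifies to restore that peak, and reports the change in long-term rms. The helper names are hypothetical, and the numbers depend on the proxy distribution, so they should not be expected to match Fig. 4.

```python
import math
import random


def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def rms_gain_from_clipping(samples, clip_db):
    """dB increase in long-term rms after clipping by clip_db and reamplifying.

    The peak is the amplitude exceeded by 0.1% of samples, so the reamplified
    signal has the same peak amplitude as the unclipped one.
    """
    magnitudes = sorted(abs(x) for x in samples)
    peak = magnitudes[int(0.999 * len(magnitudes))]  # 0.001 probability level
    clip_level = peak / 10 ** (clip_db / 20)
    clipped = [max(-clip_level, min(clip_level, x)) for x in samples]
    gain = peak / clip_level  # post-clipping amplification restores the peak
    return 20 * math.log10(rms([gain * x for x in clipped]) / rms(samples))


# Hypothetical speech proxy: Laplace-distributed samples (scale 1), fixed seed.
rng = random.Random(1)
proxy = [rng.expovariate(1.0) * rng.choice((-1, 1)) for _ in range(50_000)]
for clip_db in (6, 12, 20):
    print(clip_db, round(rms_gain_from_clipping(proxy, clip_db), 1))
```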

Validity of AI. The validity of the AI concept has been well
established for a wide variety of speech communication systems.
Some results of applying the AI calculation procedure to speech
systems for which speech intelligibility test scores were
available are presented in Figs. 5 and 6.

Figure 7 shows the relation between AI and different types of
speech test materials. Note that the percentage of test items
correctly perceived depends not only on the speech material
(nonsense syllables, words, or sentences) but also on the size
of the message set, for example, whether the test vocabulary
consisted of 1000, 256, or 32 "PB" (so-called phonetically
balanced) words.
FIG. 5 COMPARISON OF OBTAINED AND
PREDICTED TEST SCORES FOR
BROAD-BAND SPEECH IN THE
PRESENCE OF NARROW BANDS
OF NOISE SET AT VARIOUS
INTENSITY LEVELS (AFTER MILLER)

FIG. 6 COMPARISON OF OBTAINED AND
PREDICTED TEST SCORES FOR
SPEECH PASSED THROUGH A
BANDPASS FILTER AND HEARD IN ...
SECTION 3
SCIM 


Although the AI procedure is objective, it requires tedious
calculations from physical data about speech systems that are
not always easily obtained.

The Speech Communication Index Meter is an electronic device
that automatically calculates, albeit in a somewhat simplified
fashion, the AI of any given speech system. At least one
previous instrument has been built on similar principles.
General Electronics Laboratories Inc. of Cambridge,
Massachusetts, designed and built for the U. S. Army an
instrument called the Voice Interference Analysis Set (VIAS),
which also estimates the AI of speech systems. VIAS differs
from SCIM, however, in a number of respects, both in the type
of signals used for system testing and in the processing and
analysis of these signals.

Signal generator. The SCIM signal generator, shown in Figs.
8 and 9, is located at the transmitting terminal of the link
under test and normally presents a continuous 1 kc calibration
tone for transmission. When the START button is depressed,
the signal generator synchronizes itself, resulting in a
maximum delay of one second before the beginning of the test
sequence shown in Fig. 10.

The sequence begins with the replacement of the 1000 cps
calibration tone by a 30-millisecond sync burst. The burst
consists of the sum of a 600 cps sine wave and a 2000 cps sine
wave; the function of the burst is, of course, to signal the
analyzer that the test sequence has begun.

Following the sync burst there appears a "speech" signal,
produced by square-wave modulating a shaped-spectrum random
noise source. Prior to modulation the noise is tailored, by
means of a spectrum-shaping filter, to have an
amplitude-versus-frequency characteristic that closely
approximates the long-term root-mean-square spectrum of speech.
Modulation is performed at a 5 cps rate by a time-varying
attenuator that provides an on-to-off attenuation ratio of
17 dB. Since the modulation duty cycle is 50%, the high-level
and low-level portions of the "speech" signal each last
1/10 second.

FIG. 8

SIMPLIFIED BLOCK DIAGRAM OF SIGNAL GENERATOR

FIG. 10 TEST SIGNAL SEQUENCE
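The on/off modulation of the test "speech" signal is easy to state numerically. A minimal sketch, assuming an ideal square-wave gain envelope applied to noise that has already been spectrum-shaped (the shaping filter is omitted); the sample rate and the function name `modulation_gain` are hypothetical.

```python
import math

MOD_RATE_CPS = 5.0      # square-wave modulation rate
ON_OFF_RATIO_DB = 17.0  # attenuation of the low-level half cycle
DUTY_CYCLE = 0.5


def modulation_gain(t):
    """Linear gain applied to the shaped noise at time t (seconds)."""
    phase = (t * MOD_RATE_CPS) % 1.0
    if phase < DUTY_CYCLE:
        return 1.0  # high-level portion
    return 10 ** (-ON_OFF_RATIO_DB / 20)  # low-level portion, 17 dB down


SAMPLE_RATE = 8000  # hypothetical, for illustration only
envelope = [modulation_gain(n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]  # 1 s
high = sum(1 for g in envelope if g == 1.0)
print(high / SAMPLE_RATE)  # fraction of the second spent at the high level
```

With a 5 cps rate and a 50% duty cycle, each high and each low segment lasts 1/(2 x 5) = 0.1 second, as stated above.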

At the end of one second, the "speech" signal transmission is
terminated and a 3000 cps tone appears instead. A 30-millisecond
sync burst marks the temporal boundary between the "speech" and
the tone. While the 3000 cps tone is on, the analyzer measures
the background noise of the communication system under test in
the frequency range 250-1650 cps; the tone serves to keep
constant the gain of any AGC equipment that may be part of the
system being tested. After one second has elapsed, the 3000 cps
tone is replaced by a 400 cps tone, which serves the same purpose
but allows system noise measurements to be made in the
1650-4200 cps portion of the spectrum.

As before, a 30-millisecond sync burst delimits the end of the
3000 cps AGC tone and the beginning of the 400 cps AGC tone.

At the end of this third one-second interval a sync burst is
presented, and the signal generator output reverts to the
1000 cps calibration tone, which continues until the next
time the START button is depressed.
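The test sequence of Fig. 10 can be summarized as a table of segments. This sketch assumes that each 30-millisecond sync burst is inserted between, rather than carved out of, the one-second segments, so the total comes to about 3.12 s instead of exactly the 3 seconds quoted in the abstract; the text does not settle that point. The `Segment` type, the segment names, and the `segment_at` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    tones_cps: tuple  # sine frequencies present; empty for the shaped noise
    duration_s: float


SYNC = Segment("sync burst", (600, 2000), 0.030)

TEST_SEQUENCE = [
    SYNC,
    Segment('"speech" (modulated shaped noise)', (), 1.0),
    SYNC,
    Segment("AGC tone; noise measured in 250-1650 cps", (3000,), 1.0),
    SYNC,
    Segment("AGC tone; noise measured in 1650-4200 cps", (400,), 1.0),
    SYNC,  # the generator then reverts to the 1000 cps calibration tone
]


def segment_at(t):
    """Segment being transmitted t seconds after the first sync burst starts."""
    elapsed = 0.0
    for seg in TEST_SEQUENCE:
        if t < elapsed + seg.duration_s:
            return seg
        elapsed += seg.duration_s
    return Segment("1000 cps calibration tone", (1000,), float("inf"))


print(segment_at(0.5).name)
print(sum(s.duration_s for s in TEST_SEQUENCE))
```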

Analyzer. Located at the receiving terminal, the SCIM analyzer
(shown in Figs. 11 and 12) filters the received "speech" or
noise spectrum into nine bands, which together cover the
frequency range from 250 cps to 4200 cps. The lower-skirt slope
of all filters is +18 dB per octave and the upper-skirt slope
is -60 dB per octave or steeper; these slopes were chosen
specifically to approximate the inverse of the idealized
spread-of-masking curve.*

FIG. 12

SIMPLIFIED BLOCK DIAGRAM OF ANALYZER
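The skirt slopes can be turned into a rough attenuation estimate. A minimal sketch, assuming idealized straight-line skirts (18 dB per octave below the passband and 60 dB per octave above it, the minimum stated) and a flat passband. The text gives band 1 as 250-500 cps and places the boundary between bands 1-5 and bands 6-9 at 1650 cps; no other band edges are assumed. Function names are hypothetical.

```python
import math

LOWER_SKIRT_DB_PER_OCTAVE = 18.0
UPPER_SKIRT_DB_PER_OCTAVE = 60.0  # stated as "-60 or more"; the minimum is used


def attenuation_below_band_db(freq_cps, lower_edge_cps):
    """Lower-skirt attenuation for a frequency below a band's lower edge."""
    if freq_cps >= lower_edge_cps:
        return 0.0
    return LOWER_SKIRT_DB_PER_OCTAVE * math.log2(lower_edge_cps / freq_cps)


def attenuation_above_band_db(freq_cps, upper_edge_cps):
    """Upper-skirt attenuation for a frequency above a band's upper edge."""
    if freq_cps <= upper_edge_cps:
        return 0.0
    return UPPER_SKIRT_DB_PER_OCTAVE * math.log2(freq_cps / upper_edge_cps)


# The 3000 cps AGC tone seen by band 1 (upper edge 500 cps) and by band 5
# (upper edge 1650 cps); the 400 cps AGC tone seen by band 6 (lower edge 1650 cps).
print(round(attenuation_above_band_db(3000, 500), 1))
print(round(attenuation_above_band_db(3000, 1650), 1))
print(round(attenuation_below_band_db(400, 1650), 1))
```

Under these assumptions even the band closest to the 3000 cps tone rejects it by roughly 50 dB, which is consistent with the statement that the AGC tones do not disturb the noise measurements.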

The output of each filter is detected and fed to a "speech"
integrator and a noise integrator. The integrators are of the
operational-amplifier/feedback-capacitor type and provide true
time integration of the unipolar detector output signals. The
"integrate/hold" and "clear" functions are controlled by reed
relays mounted directly on each integrator printed circuit board.

The "speech" integrators are enabled during the one-second
interval in which the signal generator produces the "speech"
signal. At the end of this one-second period, integration
ceases and the final value of the integral is held. Note that
the value stored in the "speech" integrators is really the
integral of the detected "speech" plus system noise.

During the following one-second interval, the noise integrators
for the first five bands are enabled. Since no "speech" is
transmitted at this time, the final voltage stored in these
integrators is proportional to the noise alone. The frequency
of the AGC tone during this interval, the reader will recall,
is 3000 cps; this frequency is greatly attenuated by all of
the filters in bands 1 through 5, so that the presence of the
tone does not affect the measurements made in these bands.

At the end of this second one-second interval, the band 1-5
noise integrators are switched to the "hold" mode, and in the
third one-second interval the noise integrators associated with
bands 6 through 9 are enabled.

* The spread-of-masking function depends upon both the
frequency and the level of the masking noise; for the purposes
of SCIM we have chosen an approximation that, in our opinion,
is suitable for use with typical communication systems. This
idealized spread-of-masking curve falls off above the frequency
of the masking noise at a rate of 18 dB per octave; the masking
effect at frequencies below that of the masking noise is
considered to be negligible.
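The quantity that the two sets of integrators hold can be illustrated numerically. This sketch assumes square-law (power) detection, which the text does not specify, ignores the 5 cps modulation of the "speech" signal, and replaces the operational-amplifier integrators with a sum over samples. Under those assumptions the held "speech" value divided by the held noise value is the band's (S+N)/N power ratio. The sample rate, signal levels, and names are hypothetical.

```python
import math
import random


def integrate_detected(samples, dt):
    """Time integral of square-law-detected samples (stand-in for an integrator)."""
    return sum(x * x for x in samples) * dt


rng = random.Random(7)
fs = 8000                          # hypothetical sample rate
dt = 1.0 / fs
speech_rms, noise_rms = 1.0, 0.25  # hypothetical band levels (S/N = 12 dB)

# One-second "speech" interval: speech and system noise together.
speech_plus_noise = [rng.gauss(0, speech_rms) + rng.gauss(0, noise_rms)
                     for _ in range(fs)]
# One-second noise interval: system noise alone.
noise_only = [rng.gauss(0, noise_rms) for _ in range(fs)]

ratio = integrate_detected(speech_plus_noise, dt) / integrate_detected(noise_only, dt)
print(10 * math.log10(ratio))  # estimate of (S+N)/N in dB
```

With these levels the true (S+N)/N is 10 log10(1 + 16), about 12.3 dB, and the estimate from one second of samples lands close to it.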

After the entire 3-second sequence has been completed, the
analyzer enters its "calculating" mode. At this time the outputs
of the band 1 (250-500 cps) "speech" and noise integrators are
connected, by means of a commutating switch, to two logarithmic
converters. Each log converter is a temperature-stabilized
diode network that delivers at its output a d.c. voltage
proportional to the logarithm of its d.c. input voltage.


The outputs of the two log networks are compared by a very
stable d.c. differential amplifier. The output of this
amplifier is proportional to log("speech" + noise) - log(noise),
which may be rewritten as

$$\log\frac{\text{"speech"} + \text{noise}}{\text{noise}} = \log\frac{S+N}{N}$$


A diode correction network, having the transfer characteristic

$$e_{\text{out}} = K \log_{10}\left(10^{e_{\text{in}}/K} - 1\right)$$

effects the transformation of log[(S+N)/N] into log(S/N) + 12 dB,
as required by the method for AI calculation.
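In decibel terms the correction network relies on the identity S/N = (S+N)/N - 1, followed by the +12 dB offset. A minimal sketch, assuming the input is the differential-amplifier output expressed in dB (1 dB per volt) and using the exact formula rather than the diode network's approximation; the function name is hypothetical.

```python
import math


def corrected_band_db(s_plus_n_over_n_db):
    """Convert a (S+N)/N ratio in dB into S/N in dB plus the 12 dB AI offset."""
    power_ratio = 10 ** (s_plus_n_over_n_db / 10)  # (S+N)/N as a power ratio
    if power_ratio <= 1.0:
        raise ValueError("(S+N)/N must exceed 0 dB to yield a finite S/N")
    snr_db = 10 * math.log10(power_ratio - 1.0)     # S/N = (S+N)/N - 1
    return snr_db + 12.0


print(round(corrected_band_db(10 * math.log10(2)), 6))  # S = N: S/N = 0 dB -> 12.0
print(round(corrected_band_db(18.0), 2))                # at the 18-volt upper limit
```

At the 18 dB input limit, S/N is already within about 0.07 dB of the (S+N)/N reading, which is why the two are treated as nearly equal at the top of the range.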


The d.c. output voltage of the diode network is fed via a
unity-gain buffer amplifier to a voltage-to-frequency converter.
This device accepts a d.c. voltage at its input and outputs a
pulse train whose PRF (pulse repetition frequency) is directly
proportional to the applied voltage. Thus, in a fixed time, the
converter generates a number of pulses directly proportional to
the S/N ratio. These pulses are counted by a three-place decade
counter whose displays (of the two higher-order digits only)
constitute the readout display of the analyzer.
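The voltage-to-frequency conversion and counting amount to a scaled accumulation. The sketch below assumes an ideal linear converter and a hypothetical full-scale calibration in which nine bands at the 30-volt (30 dB) ceiling, each dwelt on for one second, fill the three-place counter to 999; the text does not give the actual pulse rate. The names and band readings are hypothetical.

```python
DWELL_S = 1.0                     # one-second dwell per band
FULL_SCALE_COUNTS = 999           # three-place decade counter
FULL_SCALE_VOLT_SECONDS = 9 * 30  # assumed: nine bands at 30 V for 1 s each


def pulses_for_band(volts):
    """Whole pulses emitted during one dwell by an ideal V-to-F converter."""
    return int(FULL_SCALE_COUNTS * volts * DWELL_S / FULL_SCALE_VOLT_SECONDS)


def display(count):
    """Readout showing only the two higher-order digits of a three-place counter."""
    count %= 1000  # a three-place counter wraps at 1000
    return f".{count // 10:02d}"


band_volts = [30.0, 24.0, 18.0, 30.0, 12.0, 6.0, 30.0, 15.0, 0.0]  # hypothetical
count = sum(pulses_for_band(v) for v in band_volts)
print(count, display(count))
```

Dropping the lowest-order digit is what turns a count of up to 999 into the .00 to .99 index described in the abstract.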
After dwelling for one second on the outputs of the band 1
"speech" and noise integrators, the commutator advances one
step, connecting the outputs of the corresponding band 2
integrators to the two log converters. After another
one-second dwell interval has elapsed (during which the decade
counter adds to its previous total a number proportional to
the S/N ratio in band 2), the commutator examines the band 3
integrators, and so on. When the commutator comes to rest in
position 10, the S/N ratios of all nine analyzer bands have been
totalized; the number displayed is the Speech Communication
Index (SCI), defined as

$$\mathrm{SCI} = k \sum_{i=1}^{9} \left(\frac{S}{N}\right)_i$$

where $(S/N)_i$ is the corrected, limited ratio in band $i$ (in dB)
and $k$ is the scale factor set by the voltage-to-frequency
converter and counter.

The theory behind the Articulation Index requires that, in the
calculation of AI, all S/N ratios in excess of +18 dB be
considered equal to +18 dB. The reason for this boundary
condition is simply that the contribution of any given band to
intelligibility improves as its S/N ratio improves and reaches
its maximum value at an S/N ratio of +18 dB. Larger S/N ratios,
therefore, cannot further improve the intelligibility and
should not contribute to the total AI an amount greater than
that corresponding to a +18 dB S/N ratio.

The AI theory also requires that S/N ratios less than -12 dB
(corresponding to zero contribution to intelligibility) be
considered equal to -12 dB. The reasoning is, of course, just
the converse of the argument given above for limiting S/N
ratios in excess of +18 dB.

The boundary conditions described above are imposed in the
analyzer by the d.c. differential amplifier, whose circuitry is
such that its output cannot rise above +18 volts. Since this
output level has a scale factor of 1 dB/volt, an 18-volt output
corresponds to an 18 dB (S+N)/N ratio, which at this level is
approximately the S/N ratio; after the "addition" of 12 dB by
the diode correction network, this corresponds to the upper
limit of the required 30 dB range.

The amplifier also imposes the lower boundary limit, since its
output can never fall below 0 volts (i.e., 0 dB).
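Putting the limiting and correction together gives a per-band value that is then totalized over the nine bands. A minimal sketch, assuming an exact (rather than diode-approximated) correction curve, equal weights for the nine bands, and normalization by the 270 dB full-scale total so that the result runs from 0 to 1; the last two choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math

N_BANDS = 9


def band_contribution_db(s_plus_n_over_n_db):
    """Corrected band value in dB, limited to the 0-30 dB range."""
    # Differential amplifier: output held between 0 and +18 V (1 dB per volt).
    limited = min(max(s_plus_n_over_n_db, 0.0), 18.0)
    excess = 10 ** (limited / 10) - 1.0        # S/N as a power ratio
    if excess <= 0.0:
        return 0.0                              # no measurable speech: bottom of range
    corrected = 10 * math.log10(excess) + 12.0  # S/N in dB plus 12 dB
    return min(max(corrected, 0.0), 30.0)       # S/N below -12 dB counts as -12 dB


def speech_communication_index(band_readings_db):
    """Equal-weight total of the nine band values, scaled to 0..1 (assumed)."""
    if len(band_readings_db) != N_BANDS:
        raise ValueError("SCIM analyzes nine bands")
    total = sum(band_contribution_db(x) for x in band_readings_db)
    return total / (30.0 * N_BANDS)


print(round(speech_communication_index([40.0] * 9), 3))
```

Because the 18-volt ceiling yields about 29.93 dB rather than exactly 30 dB after correction, even a noise-free system reads just under 1.00 in this sketch, which agrees with a two-digit display topping out at .99.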

A complete measurement run, including three seconds of "speech"
and AGC tone transmissions and nine seconds of "calculation"
time, may take as little as 11 seconds, since the first second
of calculation time can coincide with the third second of
transmission from the signal generator.

Analyzer sync detector. The analyzer sync detection system
consists of two filters, centered at 600 cps and 2 kc
respectively, whose outputs are detected and ANDed together.
Whenever both sync tones are found to be present, a relay is
actuated that advances a cam-operated rotary stepping switch.
This switch, in turn, sequentially provides the commands for the
analyzer to clear all integrators, integrate "speech," integrate
noise, begin calculation, and so on.
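The AND of the two sync-tone detectors can be imitated digitally. The sketch below substitutes the Goertzel algorithm (a single-bin DFT) for SCIM's analog 600 cps and 2 kc filters and detectors; the sample rate, the 30-millisecond frame, the threshold, and all names are hypothetical, and nothing here describes the actual circuitry.

```python
import math


def goertzel_power(samples, freq, fs):
    """Power of one frequency component via the Goertzel recurrence."""
    n = len(samples)
    k = round(n * freq / fs)            # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def sync_present(samples, fs, threshold):
    """True only when both sync tones (600 and 2000 cps) exceed the threshold."""
    return (goertzel_power(samples, 600, fs) > threshold
            and goertzel_power(samples, 2000, fs) > threshold)


def tone(freqs, fs, n):
    """Sum of unit-amplitude sine waves, n samples long."""
    return [sum(math.sin(2 * math.pi * f * i / fs) for f in freqs) for i in range(n)]


fs = 8000
n = round(0.030 * fs)                  # one 30 ms frame = 240 samples
burst = tone((600, 2000), fs, n)       # sync burst
cal = tone((1000,), fs, n)             # calibration tone
print(sync_present(burst, fs, 1000.0), sync_present(cal, fs, 1000.0))
```

Requiring both tones, as the ANDed detectors do, keeps the 1000 cps calibration tone or a single stray tone from advancing the stepping switch.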

The analyzer control system also contains a timing mechanism
for the commutator, which sequentially connects the logarithmic
converter inputs to the outputs of successive pairs of
integrators.

Prior  to  an  actual  measurement  run,  the  frequency  of  the  1  kc 
calibration  tone  is  measured  by  means  of  a  special  discriminator 
filter.  The  filter  output  is  detected  and  compared  with  the 
detected  input  to  the  discriminator.  The  difference  between 
these  two  d.c.  voltages  is  a  measure  of  the  mismatch  between 
the  transmitter  carrier  frequency  and  the  receiver  local 
oscillator  frequency. 
A d.c. servo loop uses this "tuning error" voltage to set a
motor-driven servo-potentiometer according to the degree of
tuning error (frequency shift) present. The servo-potentiometer
attenuates the d.c. voltage fed to the voltage-to-frequency
converter and thereby directly multiplies the AI by a number
less than 1.00. The shape of the discriminator filter is
adjusted so that the resulting decrement in AI is in accordance
with the data in Fig. 13.

Operator requirements. Ease of operation of the SCIM, in the
field and by untrained personnel, has been given major
consideration; the only front-panel controls on the signal
generator are a power on/off switch, a metered level control,
and a press-to-start switch. The generator is internally
calibrated so that, once the level of the 1000 cps calibration
tone is established and the START button is pressed, the various
signal sources within the generator will be presented to the
input of the communication link under test in the proper
sequence, for the proper durations, and at the required levels,
with no further attention from the operator.

The controls of the analyzer are equally simple: a power on/off
switch, a metered level control, and a manual reset switch. The
operator need only set the level of the incoming 1000 cps
calibration tone to 0 VU on the meter. The manual reset switch
need be used only after power is first applied to the analyzer
or when a severe signal dropout results in loss of
synchronization during a measurement run. Under normal
circumstances the analyzer is ready for a second measurement run
immediately upon completion of the first. It continues to
display its last calculated AI indefinitely, unless power is
removed or another run is begun, so the operator may read the
calculated AI from the digital display at leisure.
FIG. 13 SCI MULTIPLIER VS. FREQUENCY SHIFT


SECTION  4 
TEST  RESULTS 

There are two criteria that can be used for evaluating the
accuracy and reliability of SCIM: (1) how well the SCI's for
a variety of speech systems agree with AI's calculated in
accordance with the 20-band method, and (2) how well the SCI's
for the systems being evaluated predict the intelligibility of
speech as measured with typical psychoacoustic test methods.

SCI vs. AI. Figure 14 illustrates the agreement between
calculated AI's and SCI's as measured by SCIM when the S/N ratio
present in an otherwise "high-fidelity" speech system was
systematically varied. It is seen in this figure that, except
for values below about 0.4, SCIM estimates the calculated AI
within ±.02 of its true value; below 0.4 the disagreement
increases to approximately ±.04. Thus, at least for the
system and noise conditions represented in Fig. 14, a single
SCIM reading, or the average of two, would provide an SCI
nearly identical to the AI that would be found by the
standard calculation procedures.
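The agreement figures quoted for Fig. 14 can be expressed as a simple acceptance check. The sketch below averages one or two SCI readings and compares them with a calculated AI, using about .02 as the tolerance at or above 0.4 and about .04 below it. These tolerances apply only to the Fig. 14 system and noise conditions, and the function names and sample readings are hypothetical.

```python
from typing import Sequence


def reported_tolerance(calculated_ai: float) -> float:
    """Approximate agreement read from Fig. 14: about .02 at or above an
    AI of 0.4, about .04 below it."""
    return 0.02 if calculated_ai >= 0.4 else 0.04


def agrees_with_calculated_ai(sci_readings: Sequence[float],
                              calculated_ai: float) -> bool:
    """Average one or two SCI readings and test them against the calculated
    AI using the tolerance reported for Fig. 14."""
    if not sci_readings:
        raise ValueError("at least one SCI reading is required")
    mean_sci = sum(sci_readings) / len(sci_readings)
    # Small epsilon so binary rounding does not reject a borderline case.
    return abs(mean_sci - calculated_ai) <= reported_tolerance(calculated_ai) + 1e-9
```

For example, readings of .61 and .62 against a calculated AI of .60 pass, while a single reading of .30 against .35 falls outside the wider low-AI tolerance.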

SCI vs. speech test scores. A series of psychoacoustic tests
was conducted on a wide variety of speech systems in order
to determine the relation between SCI and measured speech
intelligibility. For these tests a listening crew of eight
high school and college students was used. The speech tests,
"Modified Rhyme" (MR) tests, were recorded at the output of the
various systems and noise conditions to be evaluated; these
recordings were then administered to the listening crew in
accordance with an experimental design that more or less
randomized the order in which the different systems and
conditions were tested. Two 50-word MR tests for each of two
male talkers were given at each test condition. The listeners
wore earphones and were seated in a soundproofed room. All
noise conditions were achieved by electrically mixing, at the
output of the speech transmission system, the speech with a
white noise that had been passed through a 4400 cps low-pass
filter. The signal-to-noise ratios (S/N) were measured on a
VU meter. The peak VU meter readings, in the absence of the
masking noise, were averaged for a sample of the test words.
The speech signal was then turned off and the level of the
noise adjusted until the VU meter showed the same value as the
average found for the peaks of speech in the absence of the
noise; the S/N was then said to be 0 dB. If the noise exceeded
the average speech peaks by 5 dB, the S/N was called -5 dB, and
so forth.
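The S/N convention above reduces to a subtraction on the VU scale: the reference is the average peak reading for speech alone, and the S/N is that reference minus the reading for noise alone. A minimal sketch, assuming VU readings are treated directly as dB values; the function names and sample readings are hypothetical.

```python
from statistics import mean
from typing import Sequence


def reference_level(speech_peak_vu: Sequence[float]) -> float:
    """Average of the peak VU readings for a sample of test words,
    taken with the masking noise switched off."""
    return mean(speech_peak_vu)


def sn_ratio_db(speech_peak_vu: Sequence[float], noise_vu: float) -> float:
    """S/N as defined in the text: 0 dB when the noise reads the same as the
    average speech peak, -5 dB when the noise reads 5 dB higher, and so on."""
    return reference_level(speech_peak_vu) - noise_vu


def noise_setting_vu(speech_peak_vu: Sequence[float], target_sn_db: float) -> float:
    """VU reading to which the noise alone must be adjusted for a target S/N."""
    return reference_level(speech_peak_vu) - target_sn_db
```

With hypothetical speech peaks of -1, 0, +1 and 0 VU (average 0 VU), the +15 dB conditions call for the noise alone to read -15 VU, and a noise reading of +5 VU corresponds to an S/N of -5 dB.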

Five SCI's were measured with SCIM for each of the test
conditions. The results of the speech tests and the SCI
measurements are presented in Table 1, and the averages, for
most of these results, in Fig. 15.
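Table 1 reports, for each condition, the average of the five SCI readings and the maximum deviation of any reading from that average. One way to compute both is sketched below. Because the table's deviations carry a sign, the sketch assumes the value reported is the signed difference of the reading farthest from the average; the function name and the sample readings are hypothetical, since the individual readings are not given here.

```python
from statistics import mean
from typing import Sequence, Tuple


def summarize_sci(readings: Sequence[float]) -> Tuple[float, float]:
    """Return (average SCI, signed deviation of the reading farthest from it),
    both rounded to two decimals like the SCIM display."""
    avg = mean(readings)
    farthest = max(readings, key=lambda r: abs(r - avg))
    return round(avg, 2), round(farthest - avg, 2)
```

For five hypothetical readings of .44, .44, .45, .41 and .46, this gives an average of .44 and a maximum deviation of -.03.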

Figure 15 shows that (except for certain of the 75 cps
frequency shift with peak clipping conditions) the measured SCI
values predict the test scores within about ±10 percentage
points. This amount of deviation (when leaving out some of
the 75 cps frequency shift conditions) compares reasonably
well with the range of differences between calculated AI and
measured intelligibility found in the previous studies shown
in Figs. 5 and 6.
TABLE 1

Columns:
  Cond.  = condition number
  S/N    = signal-to-noise ratio, dB (Note 1)
  Shift  = frequency shift, cps
  Clip   = peak clipping, dB
  LPF    = low-pass filter cutoff frequency, kc (Note 2)
  CW     = average of 2 MR tests, talker CW, percent correct
  AH     = average of 2 MR tests, talker AH, percent correct
  Both   = average of 4 MR tests, two talkers, percent correct
  SCI    = average of 5 SCI readings
  Dev.   = maximum deviation of any SCI reading from the average

Cond.  S/N  Shift  Clip  LPF    CW     AH    Both   SCI   Dev.
  1    +15     0     0   5     94.5   79.4   87.0   .71   +.01
  2     +3     0     0   5     74.0   66.0   70.0   .35   +.01
  3     -3     0     0   5     50.6   53.1   51.8   .27   +.01
  4      0     0     0   5     96.7   88.3   92.5   .84   +.08
  5    +15     0     0   3     86.0   75.0   80.5   .79   +.01
  6    +15     0     0   1.5   73.2   70.3   71.8   .57   +.01
  7    +15   +75     0   5     93.7   78.7   86.2   .64   +.01
  8    +15   -75     0   5     91.0   77.3   84.1   .44   -.03
  9     +3     0     0   3     65.7   63.7   64.7   .37   +.02
 10     +3     0     0   1.5   69.0   55.8   62.4   .39   +.02
 11     +3   +75     0   3     63.7   64.6   64.2   .46   +.01
 12     +3   +75
Note 1 - All noises were white noise, 4400 cps low-pass filtered
(-36 dB/octave), except those at a 0 dB S/N ratio. The masking
source at 0 dB S/N ratios was speech (continuous discourse by a
single talker [C.W.]).

Note 2 - Low-pass filters had -36 dB/octave slopes above the
cutoff frequency.
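In Table 1 the two-talker column should equal the mean of the two per-talker columns, to within the rounding of the tabulated values to 0.1 percentage point. The short check below uses conditions 1 to 11 (condition 12 is incomplete in this copy). The tolerance of 0.05 is an assumption based on that rounding, and the names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (condition, talker CW, talker AH, two-talker average), percent correct,
# copied from conditions 1-11 of Table 1.
TABLE_1_SCORES: List[Tuple[int, float, float, float]] = [
    (1, 94.5, 79.4, 87.0),
    (2, 74.0, 66.0, 70.0),
    (3, 50.6, 53.1, 51.8),
    (4, 96.7, 88.3, 92.5),
    (5, 86.0, 75.0, 80.5),
    (6, 73.2, 70.3, 71.8),
    (7, 93.7, 78.7, 86.2),
    (8, 91.0, 77.3, 84.1),
    (9, 65.7, 63.7, 64.7),
    (10, 69.0, 55.8, 62.4),
    (11, 63.7, 64.6, 64.2),
]


def inconsistent_rows(rows: Iterable[Tuple[int, float, float, float]],
                      tol: float = 0.05 + 1e-9) -> List[int]:
    """Condition numbers whose two-talker average differs from the mean of
    the per-talker averages by more than rounding to 0.1 can explain."""
    return [cond for cond, cw, ah, both in rows
            if abs((cw + ah) / 2 - both) > tol]
```

All eleven complete rows pass, so the two-talker averages are consistent with the per-talker averages as printed.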


FIG. 15  SHOWING PERCENT MR TEST SCORES (PERCENT MODIFIED RHYME
TEST WORDS CORRECT) AS A FUNCTION OF SCI FOR LABORATORY SPEECH
SYSTEMS, AF H-F SSB, AND MICROWAVE RADIO SYSTEMS. (SEE TABLE 1)


It is not possible to determine from these tests the source
of the "error" causing this range of differences, that is,
whether it lies with SCIM or with the speech tests. For the
number of tests administered here, the speech tests themselves
have an unreliability not much smaller than the range of
differences found between SCI and measured intelligibility.

However, SCI can be expected to represent only an approximation
to the true capability of a system for transmitting speech. For
one thing, the use of only nine filters in the analysis set does
not provide the fineness of spectrum analysis that would be
necessary to ensure extreme accuracy. Moreover, the remaining
characteristics of the SCIM system (designed to correct or
account for spread-of-masking effects, frequency shift, and
amplitude distortion, in addition to frequency-amplitude
irregularities in a transmission system) represent somewhat
idealized engineering averages for evaluating the effects of
the variables involved.
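The band-summation principle behind AI, and the reason a coarser filter set limits accuracy, can be sketched as follows. This assumes the commonly cited convention of the 20-band method that each band contributes in proportion to its speech-peak-to-noise ratio over a 30 dB range (nothing at or below 0 dB, full contribution at 30 dB or more). The equal weights used for a nine-band analysis are a hypothetical choice, since SCIM's actual filter weighting is not given in this section.

```python
from typing import Optional, Sequence


def articulation_index(peak_to_noise_db: Sequence[float],
                       weights: Optional[Sequence[float]] = None) -> float:
    """Band-weighted AI sketch (assumed 30 dB contribution range per band).

    peak_to_noise_db: speech-peak-to-noise ratio in each band, in dB.
    weights: relative importance of each band, normalized to sum to 1.
             Equal weights are used if omitted (a hypothetical choice).
    """
    n = len(peak_to_noise_db)
    if n == 0:
        raise ValueError("at least one band is required")
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n:
        raise ValueError("one weight per band is required")
    total = float(sum(weights))
    ai = 0.0
    for d, w in zip(peak_to_noise_db, weights):
        # A band contributes nothing below 0 dB and fully above 30 dB.
        fraction = min(max(d, 0.0), 30.0) / 30.0
        ai += (w / total) * fraction
    return ai
```

Under these equal-weight assumptions, one band moving entirely in or out of the 0-30 dB range changes the index by about .11 with nine bands, against .05 with twenty, which illustrates why the coarser analysis limits precision.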

All in all, it would appear that, except for certain conditions
involving frequency shifts in excess of 60 cps or so combined
with peak clipping, SCIM gives an index measurement, usable for
evaluating a wide variety of speech systems and listening
conditions, that is nearly as "accurate" as a calculated AI
and/or the measured percent score on several MR speech tests
administered to a crew of 8 trained listeners.
SECTION  5

FUTURE  DEVELOPMENTS 

We are planning to make certain additions to SCIM in order to
make it a more general and useful instrument:

1. Future instruments will include provision for a digital
printout, so that a printed record of the SCI value, time of
day, date, etc., can be produced either at, or remotely from,
the SCIM analyzer unit.

2. Acoustic couplers ("artificial heads") will be
developed for use with SCIM. The "heads" will be such that
a microphone or headset, earphone or loudspeaker can be
attached to or placed near the head as it would be when the
system is used by a person. The "head" at the transmitter
end of the system would deliver the SCIM signal acoustically
to the microphone or mouthpiece of the system being tested,
and the "head" at the receiving end would pick up the received
signal for analysis with a microphone in an artificial ear on
the "head." When the artificial heads are used, the SCIM will,
of course, measure the total system under the ambient noise
conditions at both the microphone and the earphone or
loudspeaker positions, as well as the characteristics of the
microphone and earphone.

3. Provision will be made so that SCIM can be used either
in its present form, that is, supplying the input and output
electrically to the system, and/or acoustically; thus one can
measure the performance of the system, including the
microphone, earphone and ambient noise conditions,
independently of the electrical transmission and noise
characteristics of the system.
4. Certain modifications are being contemplated that will in
time permit SCIM to operate in a more automatic fashion than is
now possible.